
    Multimodal Legal Information Retrieval

    The goal of this thesis is to present a multifaceted way of inducing semantic representations from legal documents and of accessing information in a precise and timely manner. The thesis explores approaches for semantic information retrieval (IR) in the legal context with a technique that maps specific parts of a text to the relevant concept. This technique relies on text segments: it uses Latent Dirichlet Allocation (LDA), a topic modeling algorithm, to perform text segmentation, expands the concept using Natural Language Processing techniques, and then associates the text segments with the concepts using a semi-supervised text similarity technique. This addresses two problems, namely user specificity in formulating queries and information overload, because querying a large document collection with a set of concepts is more fine-grained: specific information, rather than full documents, is retrieved. The second part of the thesis describes our Neural Network Relevance Model for E-Discovery Information Retrieval. Our algorithm is essentially a feature-rich ensemble system in which different component neural networks extract different relevance signals. This model has been trained and evaluated on the TREC Legal Track 2010 data. The performance of our models across the board shows that they capture the semantics and relatedness between query and document, which is important in the legal information retrieval domain.
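
The concept-to-segment mapping described in this abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented and the concept already expanded into a list of terms, and it swaps in plain bag-of-words cosine similarity for the thesis's semi-supervised similarity technique. The function names, the concept terms, and the example segments are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(concept_terms, segments):
    """Return (score, segment) pairs sorted from most to least similar."""
    concept_vec = bag_of_words(" ".join(concept_terms))
    scored = [(cosine(concept_vec, bag_of_words(s)), s) for s in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical expanded concept and hypothetical text segments.
concept = ["contract", "termination", "notice"]
segments = [
    "the tenant must give written notice before termination of the contract",
    "the court shall hear the appeal within thirty days",
]
ranking = rank_segments(concept, segments)
best_score, best_segment = ranking[0]
```

Because the query is a concept rather than free text, the ranking returns individual segments instead of whole documents, which is the fine-grained behaviour the abstract describes.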

    An approach to information retrieval and question answering in the legal domain

    In this paper, we report on our participation in the COLIEE 2016 Information Retrieval (IR) and Legal Question Answering (LQA) tasks. Our solution for the IR part employs a simple but effective Machine Learning (ML) procedure. Our Question Answering solution answers 'YES' or 'NO' to a question, i.e., 'YES' if the question is entailed by a text and 'NO' otherwise. Given the recent successes of multi-layered neural network systems at language modeling tasks, we present a Deep Learning approach that uses an adaptive variant of the Long Short-Term Memory (LSTM), i.e., the Child-Sum Tree-LSTM (CST-LSTM) algorithm, which we modified to suit our purpose. Additionally, we benchmarked this approach by handcrafting features for two popular ML algorithms, i.e., the Support Vector Machine (SVM) and the Random Forest (RF) algorithms. Although we used some features that have performed well in similar works, we also introduced some semantic features to improve performance. We used the results from these two algorithms as the baseline for our CST-LSTM algorithm. All evaluation was done on the COLIEE 2015 training and test sets. The overall result confirms the competitiveness of our approach.

    Rifampicin resistance among patients with Tuberculosis at the Olabisi Onabanjo University Teaching Hospital, Sagamu

    Background: Tuberculosis (TB) is a major public health problem in Nigeria. The emergence of multidrug-resistant Tuberculosis poses a threat to global Tuberculosis control and, if not effectively addressed, may wipe out the achievements of previous efforts in controlling Tuberculosis. Objectives: To determine the prevalence of, and factors associated with, rifampicin resistance among patients receiving care for TB at the Olabisi Onabanjo University Teaching Hospital, Sagamu. Methods: A retrospective study was done of presumptive Tuberculosis cases managed between January 2013 and December 2016 at the Directly Observed Treatment clinic, Olabisi Onabanjo University Teaching Hospital, Sagamu, Ogun State, Nigeria. One sputum sample was obtained from each patient for the Gene Xpert® test to diagnose TB and to determine rifampicin resistance among patients with confirmed Mycobacterium tuberculosis infection. HIV screening was also carried out on all the patients using HIV Rapid Test kits. The sociodemographic data were retrieved from the presumptive Tuberculosis register. Results: A total of 1572 presumptive TB patients were screened for TB, of whom 187 (11.8%) were confirmed to be infected with Mycobacterium tuberculosis (MTB). A total of 20 (10.7%) of the 187 MTB patients had rifampicin resistance by the Gene Xpert® method. The rifampicin resistance rate was significantly associated with the re-treatment TB category but not with age, sex or HIV status. Conclusion: The study showed rifampicin drug resistance among confirmed TB patients. There is a need to decentralize the use of the Gene Xpert® test for TB to the peripheral facilities and make it a point-of-care test for presumptive TB patients.

    A machine learning approach to detecting fraudulent job types

    Job seekers find themselves increasingly duped and misled by fraudulent job advertisements, posing a threat to their privacy, security and well-being. There is a clear need for solutions that can protect innocent job seekers. Existing approaches to detecting fraudulent jobs do not scale well, function like a black box, and lack interpretability, which is essential to guide applicants' decision-making. Moreover, commonly used lexical features may be insufficient, as the representation does not capture the contextual semantics of the underlying document. Hence, this paper explores to what extent different categories of fraudulent jobs can be classified. In addition, this paper seeks to find which types of features are most relevant in classifying the type of fraudulent job. We develop and validate a machine learning system for identifying identity theft, corporate identity theft and multi-level marketing amongst fraudulent job advertisements. We utilized four classes of features with various machine learning classifiers: empirical rule-set-based features, bag-of-words models, recent state-of-the-art word embeddings, and transformer models. The machine learning models were validated by evaluating them on a publicly available job description dataset. Our results indicate that the word embeddings and transformer-based features consistently outperformed the handcrafted rule-set-based feature class. Ultimately, a Gradient Boosting classifier with a combination of empirical rule-set-based features, part-of-speech tags and bag-of-words vectors achieved the best performance, with an F1-score of 0.88.

    A supervised keyphrase extraction system

    In this paper, we present a multi-featured supervised automatic keyphrase extraction system. We extracted salient semantic features that are descriptive of candidate keyphrases, and a Random Forest classifier was trained on them. The system achieved a precision of 58.3% and was shown to outperform two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved personal best Precision and F-measure scores of 32.7 and 25.5 respectively on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used as well as the results obtained.
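
For readers unfamiliar with the Precision and F-measure figures quoted in this abstract, the following sketch shows how such scores are typically computed for a single document. It assumes exact, case-insensitive matching between predicted and gold keyphrases; the official challenge scorer may normalise phrases differently. The function name and the example keyphrases are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted keyphrases against gold keyphrases by exact match."""
    predicted_set = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    correct = len(predicted_set & gold_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system output and gold standard for one document.
predicted = ["topic model", "random forest", "legal text", "neural network"]
gold = ["random forest", "keyphrase extraction", "topic model"]
p, r, f = precision_recall_f1(predicted, gold)
```

Here two of the four predicted phrases are correct, so precision is 2/4, recall is 2/3, and the F-measure is 4/7.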

    Text segmentation with topic modeling and entity coherence

    This paper describes a system that uses entity and topic coherence for improved Text Segmentation (TS) accuracy. First, the Latent Dirichlet Allocation (LDA) algorithm was used to obtain topics for sentences in the document. We then performed entity mapping across a window in order to discover the transition of entities within sentences. We used the information obtained to support our LDA-based boundary detection for proper boundary adjustment. We report the significance of the entity coherence approach as well as the superiority of our algorithm over existing work.
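
One way to picture how topic and entity coherence could combine is the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: it assumes each sentence already has a topic distribution (for example, from LDA) and a set of extracted entities, uses a window of one adjacent sentence, and places a boundary only where the topic similarity drops below a threshold and no entity carries over. The function names, the threshold of 0.5, and the example data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_boundaries(topic_vectors, entities, threshold=0.5):
    """Return indices i where a boundary falls between sentence i and i + 1."""
    boundaries = []
    for i in range(len(topic_vectors) - 1):
        topic_shift = cosine(topic_vectors[i], topic_vectors[i + 1]) < threshold
        shared_entities = entities[i] & entities[i + 1]
        # Entity overlap vetoes a boundary suggested by the topic shift.
        if topic_shift and not shared_entities:
            boundaries.append(i)
    return boundaries


# Hypothetical per-sentence topic distributions and entity sets.
topic_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
entities = [
    {"directive"},
    {"directive", "commission"},
    {"tenant"},
    {"tenant", "landlord"},
]
boundaries = find_boundaries(topic_vectors, entities)
```

In this toy input the topic distribution changes sharply between the second and third sentences and they share no entity, so the only boundary is placed there.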

    Multilingual Legal Information Retrieval System for Mapping Recitals and Normative Provisions

    This paper presents a multilingual legal information retrieval system for mapping recitals to articles in European Union (EU) directives and normative provisions in national legislation. Such a system could be useful for purposive interpretation of norms. Previous work on mapping recitals and normative provisions was limited to EU legislation in English and to a single lexical text similarity technique. In this paper, we develop state-of-the-art text similarity models to investigate the interplay between directive recitals, directive (sub-)articles and provisions of national implementing measures (NIMs) on a multilingual corpus (from Ireland, Italy and Luxembourg). Our results indicate that directive recitals do not have a direct influence on NIM provisions, but they sometimes contain additional information that is not present in the transposed directive sub-article, and can therefore facilitate purposive interpretation.
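
As a rough illustration of what a purely lexical recital-to-article mapping might look like, the sketch below assigns each recital to the article with the highest Jaccard token overlap. Jaccard similarity is chosen here only as a simple example of a lexical measure; the abstract does not say which lexical technique the previous work used, and the paper itself develops stronger similarity models. The identifiers and texts are hypothetical and monolingual.

```python
def tokens(text):
    """Lowercase the text and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_recitals(recitals, articles):
    """Map each recital id to the id of its most lexically similar article."""
    article_tokens = {aid: tokens(text) for aid, text in articles.items()}
    mapping = {}
    for rid, text in recitals.items():
        recital_tokens = tokens(text)
        mapping[rid] = max(
            article_tokens,
            key=lambda aid: jaccard(recital_tokens, article_tokens[aid]),
        )
    return mapping


# Hypothetical recitals and articles, not quoted from any directive.
recitals = {
    "recital_1": "consumers should receive clear information before concluding a contract",
    "recital_2": "member states should lay down penalties for infringements",
}
articles = {
    "article_5": "the trader shall provide the consumer with clear information before the contract is concluded",
    "article_24": "member states shall lay down rules on penalties applicable to infringements",
}
mapping = map_recitals(recitals, articles)
```

A measure like this depends on shared surface vocabulary, so it cannot relate texts written in different languages, which is one reason a multilingual corpus calls for other similarity models.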